Group affect refers to the subjective emotion evoked by an external stimulus in a group; it is an important factor that shapes group behavior and outcomes. Recognizing group affect involves identifying important individuals and salient objects in a crowd that can evoke emotions. Most existing methods detect faces and objects with pre-trained detectors and aggregate the results into group emotions using specific rules. However, such affective region selection mechanisms are heuristic and susceptible to imperfect face and object detections from the pre-trained detectors. Moreover, faces and objects in group-level images are often contextually related, and how important faces and objects should interact remains an open question. In this work, we incorporate the psychological concept of the Most Important Person (MIP), which represents the most noteworthy face in a crowd and carries affective semantic meaning. We propose the Dual-branch Cross-Patch Attention Transformer (DCAT), which takes the global image and the MIP together as inputs. Specifically, we first learn informative facial regions from the MIP and the global context separately. Then, a Cross-Patch Attention module fuses the MIP and global-context features so that they complement each other. With 10x fewer parameters, the proposed DCAT outperforms state-of-the-art methods on two group valence prediction datasets, GAF 3.0 and GroupEmoW. Moreover, the proposed model transfers to another group affect task, group cohesion, and achieves comparable results.
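The abstract does not spell out the internals of the Cross-Patch Attention module. What follows is only a minimal sketch of generic bidirectional cross-attention between two token sets. It assumes the MIP branch and the global branch each produce a list of equal-width patch tokens. Each branch uses its own tokens as queries and the other branch's tokens as keys and values, then adds the result back residually. The learned query/key/value projections, multiple heads, and normalization layers of a real transformer block are deliberately omitted. The names `cross_attention` and `fuse` are hypothetical.

```python
import math
from typing import List, Tuple

Tokens = List[List[float]]

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries: Tokens, keys: Tokens, values: Tokens) -> Tokens:
    """Scaled dot-product attention: each query token attends over the other branch."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def fuse(mip_tokens: Tokens, global_tokens: Tokens) -> Tuple[Tokens, Tokens]:
    """Hypothetical two-way fusion: each branch queries the other, plus a residual."""
    mip_ctx = cross_attention(mip_tokens, global_tokens, global_tokens)
    glob_ctx = cross_attention(global_tokens, mip_tokens, mip_tokens)
    mip_out = [[a + b for a, b in zip(t, c)] for t, c in zip(mip_tokens, mip_ctx)]
    glob_out = [[a + b for a, b in zip(t, c)] for t, c in zip(global_tokens, glob_ctx)]
    return mip_out, glob_out
```

When all keys are identical, the attention weights are uniform. In that case `cross_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 4.0], [4.0, 6.0]])` returns the average value vector, `[[3.0, 5.0]]`.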
Video super-resolution is one of the most popular tasks on mobile devices, widely used to automatically improve low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, resulting in low frame rates and poor power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and ask the participants to design an end-to-end, real-time video super-resolution solution for mobile NPUs, optimized for low energy consumption. The participants were provided with the REDS training dataset, containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models were evaluated on the powerful MediaTek Dimensity 9000 platform, with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, achieving frame rates of up to 500 FPS and power consumption as low as 0.2 [Watt / 30 FPS]. A detailed description of all models developed in the challenge is provided in this paper.
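The abstract reports power in the unit [Watt / 30 FPS] but does not define it. One natural reading is the average power a model would draw if it processed exactly 30 frames per second, assuming its energy per frame stays constant. The sketch below adopts that reading and normalizes a measurement of average power at some frame rate accordingly. The function names and sample figures are hypothetical; they are not results from the challenge.

```python
def energy_per_frame_joules(avg_power_watts: float, measured_fps: float) -> float:
    """Energy per processed frame: W / (frames/s) = J/frame."""
    if measured_fps <= 0:
        raise ValueError("measured_fps must be positive")
    return avg_power_watts / measured_fps

def watts_per_30fps(avg_power_watts: float, measured_fps: float) -> float:
    """Power normalized to a 30 FPS workload, assuming constant energy per frame."""
    return 30.0 * energy_per_frame_joules(avg_power_watts, measured_fps)

def upscaled_size(height: int, width: int, scale: int = 4) -> tuple:
    """Output resolution for an integer (e.g. 4X) super-resolution factor."""
    return height * scale, width * scale
```

Consider a hypothetical model that draws 2 W while running at 300 FPS. It spends about 6.7 mJ per frame, which corresponds to roughly 0.2 W at 30 FPS. Separately, a 180x320 input frame upscaled 4X becomes 720x1280.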
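Transformers have achieved great success in natural language processing. Owing to the powerful capability of the self-attention mechanism in transformers, researchers have developed vision transformers for various computer vision tasks, such as image recognition, object detection, image segmentation, pose estimation, and 3D reconstruction. This paper presents a comprehensive overview of the literature on different architecture designs and training tricks (including self-supervised learning) for vision transformers. Our goal is to provide a systematic review that points to open research opportunities.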
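Vision transformers typically present an image to the transformer as a sequence of tokens. The minimal sketch below shows the common ViT-style first step of that pipeline: splitting an image into non-overlapping square patches and flattening each one. It illustrates the general idea and is not a design taken from the survey. The linear patch projection and position embeddings that usually follow are omitted. The image is assumed to be a nested list indexed as `image[row][col][channel]`, and `patchify` is a hypothetical name.

```python
from typing import List

def patchify(image: List[List[List[float]]], patch_size: int) -> List[List[float]]:
    """Split an H x W x C image into flattened, non-overlapping patch tokens."""
    h, w = len(image), len(image[0])
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    tokens = []
    for top in range(0, h, patch_size):          # patches in row-major order
        for left in range(0, w, patch_size):
            token = []
            for r in range(top, top + patch_size):
                for c in range(left, left + patch_size):
                    token.extend(image[r][c])    # append all channels of this pixel
            tokens.append(token)
    return tokens
```

For example, a 4x4 RGB image with `patch_size=2` yields 4 tokens, each of length 2 * 2 * 3 = 12.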
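The growing demand for insights from sports analytics has stimulated productive research from a variety of perspectives, such as health state monitoring and outcome prediction. In this paper, we focus on objectively judging what stroke to return and where to return it, which remains unexplored in turn-based sports. By formulating stroke forecasting as a sequence prediction task, existing works can tackle the problem, but they fail to model information based on the characteristics of badminton. To address these limitations, we propose a novel position-aware fusion of rally progress and player styles (ShuttleNet), which incorporates rally progress and player information through two modified encoder-decoder extractors. Moreover, we design a fusion network that integrates rally contexts and player contexts by conditioning on information dependency and different positions. Extensive experiments on a badminton dataset demonstrate that ShuttleNet significantly outperforms state-of-the-art methods, and they also empirically validate the feasibility of each component in ShuttleNet. In addition, we provide an analysis scenario for the stroke forecasting problem.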
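Formulating stroke forecasting as sequence prediction means that each rally becomes a set of training examples. In each example, the strokes observed so far are the input, and the next stroke (its type and landing position, i.e. what and where to return) is the target. The sketch below shows only that data-preparation step, using assumed field names. The `Stroke` record, its fields, and the shot labels are hypothetical and do not reflect the badminton dataset's actual schema or ShuttleNet's architecture.

```python
from typing import List, NamedTuple, Tuple

class Stroke(NamedTuple):
    player: str                      # hypothetical player identifier
    shot_type: str                   # hypothetical shot label ("what")
    landing_xy: Tuple[float, float]  # hypothetical landing position ("where")

def make_forecasting_pairs(rally: List[Stroke],
                           min_history: int = 1) -> List[Tuple[List[Stroke], Stroke]]:
    """Split one rally into (observed prefix, next stroke) training pairs."""
    if min_history < 1:
        raise ValueError("min_history must be at least 1")
    return [(rally[:t], rally[t]) for t in range(min_history, len(rally))]
```

A four-stroke rally yields three pairs with the default `min_history=1`, or two pairs with `min_history=2`. Keeping the `player` field in each record leaves room for a model to condition on who is hitting.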
The development of methods for social media user stance detection and bot detection relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hinders graph-based account detection research. To address these issues, we propose the Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest raw dataset in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diverse relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments show that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and suggest potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
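MGTAB selects the 20 user property features with the greatest information gain. The sketch below shows standard information-gain ranking for discrete features, IG(Y; X) = H(Y) - H(Y | X). The feature names and toy labels are hypothetical. Continuous properties (such as follower counts) would first need to be discretized, and the benchmark's exact computation may differ.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence

def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """H(Y) minus the entropy of Y conditioned on each discrete feature value."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def top_k_features(features: Dict[str, List[Hashable]],
                   labels: Sequence[Hashable], k: int) -> List[str]:
    """Rank feature names by information gain and keep the top k."""
    ranked = sorted(features, key=lambda name: information_gain(features[name], labels),
                    reverse=True)
    return ranked[:k]
```

With labels `[0, 0, 1, 1]`, a feature that matches them exactly has an information gain of 1 bit. A feature that alternates `[0, 1, 0, 1]` has a gain of 0. So `top_k_features(..., k=1)` keeps only the former.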
For Prognostics and Health Management (PHM) of lithium-ion (Li-ion) batteries, many models have been established to characterize their degradation process. Existing empirical or physical models can reveal important information about the degradation dynamics. However, there is no general and flexible method for fusing the information represented by those models. The Physics-Informed Neural Network (PINN) is an efficient tool for fusing empirical or physical dynamic models with data-driven models. To take full advantage of the various information sources, we propose a model fusion scheme based on PINN. It is implemented by developing a semi-empirical, semi-physical Partial Differential Equation (PDE) to model the degradation dynamics of Li-ion batteries. When there is little prior knowledge about the dynamics, we leverage the data-driven Deep Hidden Physics Model (DeepHPM) to discover the underlying governing dynamic models. The uncovered dynamics information is then fused with that mined by the surrogate neural network in the PINN framework. Moreover, an uncertainty-based adaptive weighting method is employed to balance the multiple learning tasks when training the PINN. The proposed methods are verified on a public dataset of lithium iron phosphate (LFP)/graphite batteries.
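The abstract does not name its uncertainty-based weighting scheme. A widely used one, assumed here, is homoscedastic-uncertainty weighting (Kendall et al., 2018). In that scheme, each task loss L_i gets a learnable log-variance s_i, and the total loss is sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i. The sketch below holds the task losses fixed and runs gradient descent on the s_i alone. It shows that each s_i settles at log(L_i), so a task's effective weight 0.5 * exp(-s_i) becomes inversely proportional to its loss. In actual PINN training, the s_i would be optimized jointly with the network parameters. The loss values are hypothetical.

```python
import math
from typing import List

def weighted_total_loss(task_losses: List[float], log_vars: List[float]) -> float:
    """Uncertainty-weighted sum of 0.5 * exp(-s_i) * L_i + 0.5 * s_i over tasks."""
    return sum(0.5 * math.exp(-s) * L + 0.5 * s for L, s in zip(task_losses, log_vars))

def step_log_vars(task_losses: List[float], log_vars: List[float],
                  lr: float = 0.1) -> List[float]:
    """One gradient-descent step on each s_i; d/ds_i = -0.5 * exp(-s_i) * L_i + 0.5."""
    return [s - lr * (-0.5 * math.exp(-s) * L + 0.5) for L, s in zip(task_losses, log_vars)]

def effective_weights(log_vars: List[float]) -> List[float]:
    """Multiplier applied to each task loss."""
    return [0.5 * math.exp(-s) for s in log_vars]

# Hypothetical fixed losses, e.g. a data-fitting term and a PDE-residual term.
losses = [0.01, 1.0]
s = [0.0, 0.0]
for _ in range(2000):
    s = step_log_vars(losses, s)
```

After the loop, `s` is close to `[log(0.01), 0.0]`, and the effective weights are about `[50.0, 0.5]`. The smaller loss is up-weighted so that neither task dominates.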
Hybrid unmanned aerial vehicles (UAVs) combine the efficient forward flight of fixed-wing UAVs with the vertical takeoff and landing (VTOL) capability of multicopter UAVs. This paper presents the modeling, control, and simulation of a new type of hybrid micro-small UAV, coined the lifting-wing quadcopter. The lifting wing must be tilted at a specific angle, often within $45$ degrees, rather than nearly $90$ or approximately $0$ degrees. Compared with some convertiplane and tail-sitter UAVs, the lifting-wing quadcopter has a highly reliable structure, robust wind resistance, low cruise speed, and reliable transition flight, making it suitable for fully autonomous operation outdoors or in some confined indoor airspaces. In the modeling part, the forces and moments generated by both the lifting wing and the rotors are considered. Based on the established model, a unified controller for the full flight phase is designed. The controller treats hovering and forward flight uniformly and enables a continuous transition between the two modes, depending on the velocity command. Moreover, by taking rotor thrust and aerodynamic force into consideration simultaneously, an optimization-based control allocation is used to realize cooperative control for energy saving. Finally, comprehensive Hardware-In-the-Loop (HIL) simulations are performed to verify the advantages of the designed aircraft and the proposed controller.
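The abstract does not state the allocation problem, so the following is only a generic illustration of optimization-based control allocation. When more effectors are available than virtual controls, it chooses the effector commands u that achieve the desired virtual control v (B u = v) with the smallest sum of squares, u = B^T (B B^T)^{-1} v. Here ||u||^2 serves as a crude energy proxy. The 2 x 3 effectiveness matrix is hypothetical; it could represent, for instance, a vertical and a forward force produced jointly by the rotors and the wing. The paper's own formulation, including actuator limits and its aerodynamic model, is not reproduced.

```python
from typing import List

def min_norm_allocation(B: List[List[float]], v: List[float]) -> List[float]:
    """Minimize ||u||^2 subject to B u = v, for a 2 x n matrix B of full row rank."""
    # M = B B^T is 2 x 2 and symmetric.
    m00 = sum(x * x for x in B[0])
    m01 = sum(x * y for x, y in zip(B[0], B[1]))
    m11 = sum(y * y for y in B[1])
    det = m00 * m11 - m01 * m01
    if abs(det) < 1e-12:
        raise ValueError("B must have full row rank")
    # Lagrange multipliers: lam = M^{-1} v, via the closed-form 2 x 2 inverse.
    lam0 = (m11 * v[0] - m01 * v[1]) / det
    lam1 = (-m01 * v[0] + m00 * v[1]) / det
    # u = B^T lam
    return [b0 * lam0 + b1 * lam1 for b0, b1 in zip(B[0], B[1])]
```

With `B = [[1, 0, 1], [0, 1, 1]]` and `v = [2, 2]`, the allocation is `[2/3, 2/3, 4/3]`, whose squared norm is 24/9. The alternative `[0, 0, 2]` also satisfies B u = v but has a larger squared norm of 4.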
Through a study of multi-gas mixture datasets, we show that in multi-component spectral analysis, the number of functional or non-functional principal components required to retain the essential information is the same as the number of independent constituents in the mixture set. Owing to the mutual independence among different gas molecules, a near one-to-one projection from the principal components to the mixture constituents can be established, leading to a significant simplification of spectral quantification. Further, with knowledge of the molar extinction coefficients of each constituent, a complete principal component set can be extracted directly from the coefficients, and few or no training samples are required for the learning model. Compared to other approaches, the proposed methods provide fast and accurate spectral quantification with a small memory footprint.
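The central claim, that the number of principal components needed equals the number of independent constituents, can be seen in noise-free data. Under the Beer-Lambert law, each mixture spectrum is a concentration-weighted sum of the constituents' extinction spectra. A stack of mixture spectra therefore has rank equal to the number of linearly independent constituents. That rank equals the number of nonzero singular values, i.e. uncentered principal components. The sketch below builds hypothetical three-constituent mixtures and counts the rank by Gaussian elimination. The spectra and concentrations are made up, and with real, noisy spectra one would inspect each component's explained variance instead of an exact rank.

```python
from typing import List

def mix_spectra(concentrations: List[List[float]],
                constituents: List[List[float]]) -> List[List[float]]:
    """Beer-Lambert mixing with unit path length: row i is sum_k c_ik * eps_k."""
    n_wl = len(constituents[0])
    return [[sum(c * e[w] for c, e in zip(conc, constituents)) for w in range(n_wl)]
            for conc in concentrations]

def numerical_rank(rows: List[List[float]], tol: float = 1e-9) -> int:
    """Rank via Gaussian elimination with partial pivoting; |pivot| < tol counts as zero."""
    m = [[float(x) for x in r] for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            f = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= f * m[rank][c]
        rank += 1
    return rank
```

Five mixtures of three independent hypothetical constituents, sampled at six wavelengths, give a 5 x 6 matrix of rank 3. Three components carry all of the signal.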
In this paper, we propose an effective unified control law for accurately tracking agile trajectories with lifting-wing quadcopters of different installation angles, which are capable of vertical takeoff and landing (VTOL) as well as high-speed cruise flight. First, we derive a differential flatness transform for the lifting-wing dynamics with a nonlinear model under the coordinated turn condition. To improve tracking performance on agile trajectories, the proposed controller uses the state and input variables calculated from differential flatness as feedforward. In particular, the jerk, i.e., the third-order derivative of the trajectory, is converted into angular velocity as a feedforward term, which significantly improves the system bandwidth. At the same time, feedback and feedforward outputs are combined to deal with external disturbances and model mismatch. The control algorithm has been thoroughly evaluated in outdoor flight tests, which show that it achieves accurate trajectory tracking.
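The lifting-wing flatness transform also accounts for aerodynamic forces under the coordinated turn condition, which this sketch does not model. For intuition only, the sketch below shows the well-known quadrotor version of the jerk-to-angular-velocity feedforward, as in Mellinger and Kumar's minimum-snap work. It assumes a point-mass model with thrust along the body z axis, a z-up world frame, and a given yaw. The component of the jerk orthogonal to the thrust direction, divided by the mass-normalized thrust, gives the body roll and pitch rates. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]

def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))

def cross(a: Vec, b: Vec) -> Vec:
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def normalize(v: Vec) -> Vec:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def feedforward_body_rates(acc: Vec, jerk: Vec, yaw: float, yaw_rate: float,
                           g: float = 9.81) -> Tuple[float, float, float]:
    """Quadrotor (not lifting-wing) feedforward body rates p, q, r from acceleration and jerk."""
    t = [acc[0], acc[1], acc[2] + g]          # mass-normalized thrust vector
    c = math.sqrt(dot(t, t))                  # thrust per unit mass
    z_b = [x / c for x in t]                  # body z axis = thrust direction
    x_c = [math.cos(yaw), math.sin(yaw), 0.0]
    y_b = normalize(cross(z_b, x_c))          # undefined if z_b is parallel to x_c
    x_b = cross(y_b, z_b)
    # Jerk component orthogonal to the thrust axis, scaled by 1/c.
    proj = dot(z_b, jerk)
    h_w = [(j - proj * z) / c for j, z in zip(jerk, z_b)]
    p = -dot(h_w, y_b)                        # roll rate
    q = dot(h_w, x_b)                         # pitch rate
    r = yaw_rate * z_b[2]                     # yaw rate from the yaw reference
    return p, q, r
```

At hover with a forward jerk of 1 m/s^3, this gives a pitch rate of about 1/9.81 rad/s and zero roll rate. The thrust axis starts tilting forward before any attitude error builds up.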
Human Activity Recognition (HAR) is one of the core research areas in mobile and wearable computing. With the application of deep learning (DL) techniques such as CNNs, recognizing periodic or static activities (e.g., walking, lying, and cycling) has become a well-studied problem. A major remaining challenge, however, is the sporadic activity recognition (SAR) problem, in which activities of interest tend to be non-periodic and occur less frequently than the often large amount of irrelevant background activity. Recent works suggest that sequential DL models (such as LSTMs) have great potential for modeling non-periodic behaviours, and in this paper we study several LSTM training strategies for SAR. Specifically, we propose two simple yet effective LSTM variants, namely the delay model and the inverse model, for two SAR scenarios (with and without a time-critical requirement). For time-critical SAR, the delay model can effectively exploit predefined delay intervals (within tolerance) as contextual information to improve performance. For the regular SAR task, the second proposed model, the inverse model, learns patterns from the time series in reverse order, which can complement the forward model (i.e., the LSTM); combining the two boosts performance. These two LSTM variants are very practical and can be regarded as training strategies that do not alter the LSTM fundamentals. We also study some additional LSTM training strategies that further improve accuracy. We evaluated our models on two SAR datasets and one non-SAR dataset, and the promising results demonstrate the effectiveness of our approaches in HAR applications.
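The abstract describes the delay and inverse models as training strategies rather than architectural changes, so much of the difference lies in how training examples are built. The sketch below is one possible reading of that, assumed here rather than taken from the paper:

- The delay model pairs the label at time t with an input window that ends `delay` steps later (the tolerated latency).
- The inverse model feeds each window in reverse time order.
- The forward and inverse models' class probabilities are averaged.

The LSTMs themselves are omitted, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def delay_windows(signal: Sequence[float], labels: Sequence[str],
                  window: int, delay: int) -> List[Tuple[list, str]]:
    """Delay model (assumed reading): label at t, input window ending at t + delay."""
    if window < 1 or delay < 0:
        raise ValueError("window must be >= 1 and delay must be >= 0")
    pairs = []
    first = max(0, window - 1 - delay)        # earliest t with a full window
    for t in range(first, len(signal) - delay):
        end = t + delay                       # the model may look `delay` steps past t
        pairs.append((list(signal[end - window + 1:end + 1]), labels[t]))
    return pairs

def inverse_windows(pairs: List[Tuple[list, str]]) -> List[Tuple[list, str]]:
    """Inverse model (assumed reading): the same windows, fed in reverse time order."""
    return [(list(reversed(x)), y) for x, y in pairs]

def combine_scores(forward_probs: List[float], inverse_probs: List[float]) -> List[float]:
    """Average the class probabilities of the forward and inverse models."""
    return [(f + b) / 2 for f, b in zip(forward_probs, inverse_probs)]
```

With a 10-step signal, `window=3`, and `delay=2`, the first example pairs samples 0 to 2 with the label at t = 0. The inverse model would read that same window as samples 2, 1, 0.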